Connectionist multivariate density-estimation and its application to speech synthesis

Author

  • Benigno Uria
Abstract

Autoregressive models factorize a multivariate joint probability distribution into a product of one-dimensional conditional distributions. The variables are assigned an ordering, and the conditional distribution of each variable is modelled using all variables preceding it in that ordering as predictors. Calculating normalized probabilities and sampling both have polynomial computational complexity under autoregressive models. Moreover, binary autoregressive models based on neural networks obtain statistical performance similar to that of some intractable models, like restricted Boltzmann machines, on several datasets. The use of autoregressive probability density estimators based on neural networks to model real-valued data, while proposed before, has never been properly investigated and reported. In this thesis we extend the formulation of neural autoregressive distribution estimators (NADE) to real-valued data, a model we call the real-valued neural autoregressive density estimator (RNADE). Its statistical performance on several datasets, including visual and auditory data, is reported and compared to that of other models. RNADE obtained higher test likelihoods than other tractable models, while retaining all the attractive computational properties of autoregressive models. However, autoregressive models are limited by the ordering of the variables inherent to their formulation. Marginalization and imputation tasks can only be solved analytically if the missing variables are at the end of the ordering. We present a new training technique that obtains a set of parameters that can be used for any ordering of the variables. By choosing a model with a convenient ordering of the dimensions at test time, it is possible to solve any marginalization or imputation task analytically. The same training procedure also makes it practical to train NADEs and RNADEs with several hidden layers.
The resulting deep and tractable models display higher test likelihoods than the equivalent one-hidden-layer models for all the datasets tested. Ensembles of NADEs or RNADEs can be created inexpensively by combining models that share their parameters but differ in the ordering of the variables. These ensembles of autoregressive models obtain state-of-the-art statistical performance on several datasets. Finally, we demonstrate the application of RNADE to speech synthesis, and...
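The autoregressive factorization described in the abstract can be illustrated with a minimal sketch of a binary NADE-style model. This is not the thesis implementation: the parameter names `W`, `V`, `b`, `c`, the helper names, and the random (untrained) parameters are all assumptions made for illustration, and only the binary case is shown, not RNADE. The sketch evaluates log p(x) as a sum of one-dimensional conditional log-probabilities and checks numerically that the resulting distribution is normalized.

```python
import itertools
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def nade_log_prob(x, W, V, b, c):
    """log p(x) = sum_d log p(x_d | x_<d) for a binary vector x.

    Each conditional is a Bernoulli whose parameter is computed from a
    hidden layer that only sees the variables preceding x_d in the ordering.
    Parameter names and shapes are illustrative assumptions.
    """
    D = x.shape[0]
    a = c.copy()                 # hidden pre-activation; no inputs seen yet
    log_p = 0.0
    for d in range(D):
        h = sigmoid(a)                     # hidden units given x_<d
        p_d = sigmoid(b[d] + V[d] @ h)     # p(x_d = 1 | x_<d)
        log_p += x[d] * np.log(p_d) + (1 - x[d]) * np.log(1.0 - p_d)
        a += W[:, d] * x[d]                # add x_d's contribution for the next step
    return log_p

def make_params(D, H, seed=0):
    """Random, untrained parameters (hypothetical, for demonstration only)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(H, D))
    V = rng.normal(scale=0.5, size=(D, H))
    b = rng.normal(scale=0.5, size=D)
    c = rng.normal(scale=0.5, size=H)
    return W, V, b, c

def total_probability(D, H, seed=0):
    """Sum p(x) over all 2^D binary vectors; should be 1 up to rounding."""
    params = make_params(D, H, seed)
    return sum(
        np.exp(nade_log_prob(np.array(bits, dtype=float), *params))
        for bits in itertools.product([0, 1], repeat=D)
    )

if __name__ == "__main__":
    print(total_probability(D=4, H=3))
```

Because every conditional is itself a normalized Bernoulli distribution, the product is normalized for any parameter values, which is why no partition function has to be computed. Updating the hidden pre-activation incrementally keeps the cost of one likelihood evaluation proportional to the number of variables times the number of hidden units.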

Similar resources

A Density Functional Approach toward Structural Features and Properties of C20 and its Complexes with C2X4, C2X2 (X = H, F, Cl, Br) for Synthesis Application

The complexes between C20 and C2X4, C2X2 (X = H, F, Cl, Br) have been studied theoretically at the B3LYP/6-311G(d,p) level. The calculations include the optimized geometries, the interaction energies, aromaticity and thermodynamic properties. The interaction energies range from -60 to -101 kcal/mol and are ordered as: X = F > Cl > Br. Natural bond orbital (NBO) analysis has been performed on all ge...

Moment Inequalities for Supremum of Empirical Processes of U-Statistic Structure and Application to Density Estimation

We derive moment inequalities for the supremum of empirical processes of U-statistic structure and give an application to kernel-type density estimation and to estimation of the distribution function for functions of observations.

Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques

One of the interesting topics in the multimedia domain is enabling computers to produce speech. Speech synthesis gives the computer the human ability of speech production. The data-based approach and the process-based approach are the two main approaches to speech synthesis. Each approach has its own challenges. Unit-selection speech synthesis and statistical parametr...

Significance of Maximum Spectral Amplitude in Sub-bands for Spectral Envelope Estimation and Its Application to Statistical Parametric Speech Synthesis

In this paper we propose a technique for spectral envelope estimation using the maximum values in the sub-bands of the Fourier magnitude spectrum (MSASB). Most other methods in the literature parametrize the spectral envelope in the cepstral domain, for example with the Mel-generalized cepstrum. Such cepstral-domain representations, although compact, are not readily interpretable. This difficulty is overcome by our meth...

Connectionist Probability Estimation in HMM Speech Recognition

This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Hervé Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We dis...


Publication date: 2016